340 research outputs found

    A Network Topology Approach to Bot Classification

    Full text link
    Automated social agents, or bots, are increasingly becoming a problem on social media platforms. There is a growing body of literature and multiple tools to aid in the detection of such agents on online social networking platforms. We propose that the social network topology of a user would be sufficient to determine whether the user is a automated agent or a human. To test this, we use a publicly available dataset containing users on Twitter labelled as either automated social agent or human. Using an unsupervised machine learning approach, we obtain a detection accuracy rate of 70%

    Belief Hierarchical Clustering

    Get PDF
    In the data mining field many clustering methods have been proposed, yet standard versions do not take into account uncertain databases. This paper deals with a new approach to cluster uncertain data by using a hierarchical clustering defined within the belief function framework. The main objective of the belief hierarchical clustering is to allow an object to belong to one or several clusters. To each belonging, a degree of belief is associated, and clusters are combined based on the pignistic properties. Experiments with real uncertain data show that our proposed method can be considered as a propitious tool

    Tuning the Level of Concurrency in Software Transactional Memory: An Overview of Recent Analytical, Machine Learning and Mixed Approaches

    Get PDF
    Synchronization transparency offered by Software Transactional Memory (STM) must not come at the expense of run-time efficiency, thus demanding from the STM-designer the inclusion of mechanisms properly oriented to performance and other quality indexes. Particularly, one core issue to cope with in STM is related to exploiting parallelism while also avoiding thrashing phenomena due to excessive transaction rollbacks, caused by excessively high levels of contention on logical resources, namely concurrently accessed data portions. A means to address run-time efficiency consists in dynamically determining the best-suited level of concurrency (number of threads) to be employed for running the application (or specific application phases) on top of the STM layer. For too low levels of concurrency, parallelism can be hampered. Conversely, over-dimensioning the concurrency level may give rise to the aforementioned thrashing phenomena caused by excessive data contention—an aspect which has reflections also on the side of reduced energy-efficiency. In this chapter we overview a set of recent techniques aimed at building “application-specific” performance models that can be exploited to dynamically tune the level of concurrency to the best-suited value. Although they share some base concepts while modeling the system performance vs the degree of concurrency, these techniques rely on disparate methods, such as machine learning or analytic methods (or combinations of the two), and achieve different tradeoffs in terms of the relation between the precision of the performance model and the latency for model instantiation. Implications of the different tradeoffs in real-life scenarios are also discussed

    An augmented Lagrangian decomposition method for quasi-separable problems in MDO

    Get PDF
    Several decomposition methods have been proposed for the distributed optimal design of quasi-separable problems encountered in Multidisciplinary Design Optimization (MDO). Some of these methods are known to have numerical convergence difficulties that can be explained theoretically. We propose a new decomposition algorithm for quasi-separable MDO problems. In particular, we propose a decomposed problem formulation based on the augmented Lagrangian penalty function and the block coordinate descent algorithm. The proposed solution algorithm consists of inner and outer loops. In the outer loop, the augmented Lagrangian penalty parameters are updated. In the inner loop, our method alternates between solving an optimization master problem, and solving disciplinary optimization subproblems. The coordinating master problem can be solved analytically; the disciplinary subproblems can be solved using commonly available gradient-based optimization algorithms. The augmented Lagrangian decomposition method is derived such that existing proofs can be used to show convergence of the decomposition algorithm to KKT points of the original problem under mild assumptions. We investigate the numerical performance of the proposed method on two example problems. I

    Noise-robust method for image segmentation

    Get PDF
    Segmentation of noisy images is one of the most challenging problems in image analysis and any improvement of segmentation methods can highly influence the performance of many image processing applications. In automated image segmentation, the fuzzy c-means (FCM) clustering has been widely used because of its ability to model uncertainty within the data, applicability to multi-modal data and fairly robust behaviour. However, the standard FCM algorithm does not consider any information about the spatial linage context and is highly sensitive to noise and other imaging artefacts. Considering above mentioned problems, we developed a new FCM-based approach for the noise-robust fuzzy clustering and we present it in this paper. In this new iterative algorithm we incorporated both spatial and feature space information into the similarity measure and the membership function. We considered that spatial information depends on the relative location and features of the neighbouring pixels. The performance of the proposed algorithm is tested on synthetic image with different noise levels and real images. Experimental quantitative and qualitative segmentation results show that our method efficiently preserves the homogeneity of the regions and is more robust to noise than other FCM-based methods

    Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics

    Get PDF
    Bacterial small non-coding RNAs (sRNAs) are being recognized as novel widespread regulators of gene expression in response to environmental signals. Here, we present the first search for sRNA-encoding genes in the nitrogen-fixing endosymbiont Sinorhizobium meliloti, performed by a genome-wide computational analysis of its intergenic regions. Comparative sequence data from eight related α-proteobacteria were obtained, and the interspecies pairwise alignments were scored with the programs eQRNA and RNAz as complementary predictive tools to identify conserved and stable secondary structures corresponding to putative non-coding RNAs. Northern experiments confirmed that eight of the predicted loci, selected among the original 32 candidates as most probable sRNA genes, expressed small transcripts. This result supports the combined use of eQRNA and RNAz as a robust strategy to identify novel sRNAs in bacteria. Furthermore, seven of the transcripts accumulated differentially in free-living and symbiotic conditions. Experimental mapping of the 5′-ends of the detected transcripts revealed that their encoding genes are organized in autonomous transcription units with recognizable promoter and, in most cases, termination signatures. These findings suggest novel regulatory functions for sRNAs related to the interactions of α-proteobacteria with their eukaryotic hosts

    Fuzzy cluster validation using the partition negentropy criterion

    Full text link
    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-04277-5_24Proceedings of the 19th International Conference, Limassol, Cyprus, September 14-17, 2009We introduce the Partition Negentropy Criterion (PNC) for cluster validation. It is a cluster validity index that rewards the average normality of the clusters, measured by means of the negentropy, and penalizes the overlap, measured by the partition entropy. The PNC is aimed at finding well separated clusters whose shape is approximately Gaussian. We use the new index to validate fuzzy partitions in a set of synthetic clustering problems, and compare the results to those obtained by the AIC, BIC and ICL criteria. The partitions are obtained by fitting a Gaussian Mixture Model to the data using the EM algorithm. We show that, when the real clusters are normally distributed, all the criteria are able to correctly assess the number of components, with AIC and BIC allowing a higher cluster overlap. However, when the real cluster distributions are not Gaussian (i.e. the distribution assumed by the mixture model) the PNC outperforms the other indices, being able to correctly evaluate the number of clusters while the other criteria (specially AIC and BIC) tend to overestimate it.This work has been partially supported with funds from MEC BFU2006-07902/BFI, CAM S-SEM-0255-2006 and CAM/UAM project CCG08-UAM/TIC-442

    Intelligent Visual Descriptor Extraction from Video Sequences

    Full text link

    Optimization clustering techniques on register unemployment data

    Get PDF
    An important strategy for data classification consists in organising data points in clusters. The k-means is a traditional optimisation method applied to cluster data points. Using a labour market database, aiming the segmentation of this market taking into account the heterogeneity resulting from different unemployment characteristics observed along the Portuguese geographical space, we suggest the application of an alternative method based on the computation of the dominant eigenvalue of a matrix related with the distance among data points. This approach presents results consistent with the results obtained by the k-means.info:eu-repo/semantics/publishedVersio
    corecore